RNN-BLSTM Based Multi-Pitch Estimation

نویسندگان

Jianshu Zhang

Jian Tang

Li-Rong Dai

چکیده

Multi-pitch estimation is critical in many applications, including computational auditory scene analysis (CASA), speech enhancement/separation and mixed speech analysis; however, despite much effort, it remains a challenging problem. This paper uses the PEFAC algorithm to extract features and proposes the use of recurrent neural networks with bidirectional Long ShortTerm Memory (RNN-BLSTM) to model the two pitch contours of a mixture of two speech signals. Compared with feedforward deep neural networks (DNN), which are trained on static framelevel acoustic features, RNN-BLSTM is trained on sequential frame-level features and has more power to learn pitch contour temporal dynamics. The results of evaluations using a speech dataset containing mixtures of two speech signals demonstrate that RNN-BLSTM can substantially outperform DNN in multipitch estimation of mixed speech signals.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Frame Stacking in RNN-Based Tandem ASR Systems - Learned vs. Predefined Context

As phoneme recognition is known to profit from techniques that consider contextual information, neural networks applied in Tandem automatic speech recognition (ASR) systems usually employ some form of context modeling. While approaches based on multi-layer perceptrons or recurrent neural networks (RNN) are able to model a predefined amount of context by simultaneously processing a stacked seque...

متن کامل

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer

Prosodic structure generation from text plays an important role in Chinese text-to-speech (TTS) synthesis, which greatly influences the naturalness and intelligibility of the synthesized speech. This paper proposes a multi-task learning method for prosodic structure generation using bidirectional long shortterm memory (BLSTM) recurrent neural network (RNN) and structured output layer (SOL). Unl...

متن کامل

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network

Bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) has been successfully applied in many tagging tasks. BLSTM-RNN relies on the distributed representation of words, which implies that the former can be futhermore improved through learning the latter better. In this work, we propose a novel approach to learn distributed word representations by training BLSTM-RNN on a spe...

متن کامل

A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks

This paper describes the design of an acoustic language recognition system based on BLSTM that can discriminate closely related languages and dialects of the same language. We introduce a Divide-and-Conquer (D&C) method to quickly and successfully train an RNN-based multi-language classifier. Experiments compare this approach to the straightforward training of the same RNN, as well as to two wi...

متن کامل

TTS synthesis with bidirectional LSTM based recurrent neural networks

Feed-forward, Deep neural networks (DNN)-based text-tospeech (TTS) systems have been recently shown to outperform decision-tree clustered context-dependent HMM TTS systems [1, 4]. However, the long time span contextual effect in a speech utterance is still not easy to accommodate, due to the intrinsic, feed-forward nature in DNN-based modeling. Also, to synthesize a smooth speech trajectory, th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

RNN-BLSTM Based Multi-Pitch Estimation

نویسندگان

چکیده

منابع مشابه

Feature Frame Stacking in RNN-Based Tandem ASR Systems - Learned vs. Predefined Context

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network

A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks

TTS synthesis with bidirectional LSTM based recurrent neural networks

عنوان ژورنال:

اشتراک گذاری